Skip to content

Introduce datatable.unique.names policy for duplicate handling in setnames() #4044#7674

Open
venom1204 wants to merge 3 commits intomasterfrom
issue4044
Open

Introduce datatable.unique.names policy for duplicate handling in setnames() #4044#7674
venom1204 wants to merge 3 commits intomasterfrom
issue4044

Conversation

@venom1204
Copy link
Copy Markdown
Contributor

Closes #4044

Hi @ben-schwen,

Apologies for opening a new PR—there was an issue with the previous one. I’ve incorporated the requested changes, while keeping the rest unchanged.

Kindly review it when you have the time. Also, sorry for the delay; I was occupied with other commitments.

Thank you!

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (7db13b9) to head (471ab20).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7674   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          87       87           
  Lines       17031    17049   +18     
=======================================
+ Hits        16868    16886   +18     
  Misses        163      163           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 20, 2026

No obvious timing issues in HEAD=issue4044
Comparison Plot

Generated via commit 471ab20

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 53 seconds
Installing different package versions 23 seconds
Running and plotting the test cases 4 minutes and 3 seconds

@venom1204 venom1204 requested a review from ben-schwen April 14, 2026 20:47
Comment thread inst/tests/tests.Rraw
test(2367.6, fread(file(f)), data.table(), warning="Connection has size 0.")
unlink(f)

#4044
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please dont only write the issue number, but what issue is fixed. this provides information without needing to search for the issue on github.

The tests seem ok but an example of DT = data.table(a=1:2, b=c('x', 'y')) would have been easier to grasp (and also less typing/code)

Comment thread R/utils.R

process_name_policy = function(names_vec) {
policy = getOption("datatable.unique.names")
if (is.null(policy) || policy == "off") return(names_vec)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no ultra happy with this double option for "off". I would either eliminate NULL or "off" (leaning towards eliminating NULL though)

Comment thread man/data.table-options.Rd
\item{\code{datatable.enlist}}{Experimental feature. Default is \code{NULL}. If set to a function
(e.g., \code{list}), the \code{j} expression can return a \code{list}, which will then
be "enlisted" into columns in the result.}
\item{\code{datatable.unique.names}}{A character string, default \code{NULL} (same as \code{"off"}).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also document the other options, e.g. warn, error etc.

Comment thread R/utils.R
msg = paste0("Duplicate column names created: ", brackify(dups), ". This may cause ambiguity.")

switch(policy,
warn = warningf(msg),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above from previous review. this will trip with

DT = data.table("a%d" = 1, b = 2)
options(datatable.unique.names = "warn")
setnames(DT, "b", "a%d")
# Error in sprintf(gettext(fmt, domain = domain, trim = trim), ...) : 
#   too few arguments

Hence, you need to pass format strings like warningf("%s", msg)

Comment thread NEWS.md

5. `tables()` can now optionally report `data.table` objects stored one level deep inside list objects when `depth=1L`, [#2606](https://github.com/Rdatatable/data.table/issues/2606). Thanks @MichaelChirico for the report and @manmita for the PR

6. `setnames()` now supports a global option `datatable.unique.names` to control the creation of duplicate column names. Users can choose between `"off"` (default), `"warn"`, `"error"`, or `"rename"`. This addresses long-standing ambiguity issues when duplicate names were created silently, [#4044](https://github.com/Rdatatable/data.table/issues/4044). Thanks to @venom1204 for the PR.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we need this part since its superfluous "This addresses long-standing ambiguity issues when duplicate names were created silently"

Comment thread R/utils.R
table_name, brackify(duplicate_names), domain=NA)
}

process_name_policy = function(names_vec) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since we only use it with setnames it should probably live inside data.table.R in front of setnames

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Should setnames() produce duplicate column names without warning?

2 participants